Abstract

We consider the problem of testing means from samples of two populations for which the labels are not known with certainty. We show that this problem is connected to another one: testing the expected values of components of mixture-models from two data samples. The underlying mixture-model is associated with known varying mixing-weights. We provide a testing procedure that performs well. We then point out the loss of performance of our method due to the mixing effect by comparing its numerical performance with that of Welch's t-test on means, which would have been used if the true labels were available.
models.

AMS Subject Classification: 62D05, 62F03, 62F05. 

1 Introduction 

Researchers are often interested in gathering information about two populations in order to compare them. In that setting, tests of significance are useful statistical tools for detecting a difference between two population parameters. Related application fields are numerous; examples include genetics, neuronal data analysis, medicine, biology, physics, chemistry and the social sciences.



Consider the Gaussian setting, in which each observation from the two populations under study is assumed to follow a normal distribution. Recall that this assumption can be tested beforehand using a normality test, such as the well-known Shapiro-Wilk or Kolmogorov-Smirnov test, or assessed graphically using a normal quantile plot. Comparisons between the means of the populations are usually carried out with a t-statistic, which leads to the well-known Student's t-test or Welch's t-test (see Welch [9]).

These t-tests are popular because of their ease of use and their good performance. Moreover, they are robust in the sense that they still perform well when the components are not exactly Gaussian, provided that the sample sizes are large enough. Nevertheless, these testing methods require knowing the label of each observation, that is, the population each observation belongs to. Unfortunately, researchers do not always have this information. Indeed, one can imagine cases where the labels are erroneous or uncertain, i.e. some observations attributed to a population do not actually belong to the population we want to compare. To give an example of such a situation, consider two populations, people from New York and people from California, restricted to those who travel to work by bus/trolley bus or on foot. Focusing on workers who take the bus/trolley bus, suppose we want to know whether the travel time of people from New York differs significantly from that of people from California, using a sample in which the place of residence (New York or California) is available but the means of travel (the label) associated with each observation is not.
This is the kind of situation we are interested in. Indeed, we want to address the problem of testing means of (sub)populations when the labels of the data are uncertain. More precisely, we first show that this testing problem can be reformulated as testing the expected values of components from two samples of independent mixture variables. In our study of real data, we shall assume that the mixing-weights are known. This means that the proportions of people walking or using the bus/trolley bus in each population (New York and California) are known, with respect to an auxiliary variable (age for instance). Then, we provide a testing procedure that takes this information on the populations into account, and we discuss its performance.

The testing procedure we propose is directly inspired by ideas in Autin and Pouet [1]. In this previous work, a nonparametric procedure was proposed to test whether the densities of two independent samples of independent random variables result from the same mixture of components or not. Computing the test statistic requires, as a preliminary step, inverting in some sense the mixing-weights operators of the samples (see Definition 1). This testing procedure was proved to be powerful since it is minimax over Besov spaces (more details are given in paragraph 3.1 of Autin and Pouet [1]). Focusing more on practical purposes, we show that a testing procedure that incorporates combinatory ideas, provided that the mixing-weights are known, is quite relevant compared with a procedure commonly used in classification.

The paper is organized as follows. In Section 2 we present the mixture-model we are interested in. We also provide the connection between the testing problem for which the labels of the data are not certain and the problem of testing the expected value of the components involved in the mixture-model. In Section 3 we present three testing procedures. The first one is the Oracle Procedure, which uses Welch's t-test on the data associated with the label of the components we want to test. Of course this procedure is not tractable for the testing problem with lack of information about labels, but it will be used as a benchmark to assess the loss of performance of the other, tractable testing procedures. The second testing procedure we present is the Expert Procedure, which uses Welch's t-test on the data that are supposed to have, with probability larger than or equal to one half, the label associated with the components we focus on. The third and last procedure, namely the Mixing Procedure, uses combinatory properties leading to a new, well-performing test. Section 4 deals with numerical experiments that point out the good performance of the Mixing Procedure (the one we suggest) compared with the Expert Procedure, and that assess the loss of performance due to the mixing effect compared with the Oracle Procedure. An application to real data is also presented, whereas a brief conclusion is postponed to the last section. Finally, the technical lemmas and the proposition used to prove our main theoretical result (see Theorem 1), together with their proofs, can be found in the appendix.

2 Model description and hypothesis testing problem

2.1 Mixture-models with varying mixing-weights

Let $X_1, \dots, X_n$ be independent random variables such that, for any $1 \le i \le n$, the density of $X_i$ on $\mathbb{R}$, denoted by $f_{X_i}$, is a mixture density with components $p_1$ and $p_2$ and mixing-weights $\omega_1(i)$ and $\omega_2(i)$, i.e.

$$f_{X_i} = \omega_1(i)\,p_1 + \omega_2(i)\,p_2.$$

We also introduce labels attached to $X_1, \dots, X_n$, denoted by $u_1, \dots, u_n$. This point of view is one interpretation of mixture-models among others (see Section 1.4 in McLachlan and Peel [3]). The main difference lies in considering varying mixing-weights in our model. This point is very important (see Autin and Pouet [1]). Therefore our model cannot be described as a mixture-model in the usual sense.

Similarly to the sample $X_1, \dots, X_n$, we consider a sample of independent random variables $Y_1, \dots, Y_n$ such that, for any $1 \le i \le n$, the density of $Y_i$ on $\mathbb{R}$, denoted by $f_{Y_i}$, is a mixture density with components $p'_1$ and $p'_2$ and mixing-weights $\omega'_1(i)$ and $\omega'_2(i)$, i.e.

$$f_{Y_i} = \omega'_1(i)\,p'_1 + \omega'_2(i)\,p'_2.$$

We also introduce labels attached to $Y_1, \dots, Y_n$, denoted by $v_1, \dots, v_n$, and we assume that this second sample is independent of the first one.

If ${}^t$ denotes the transpose operator, the two mixture-models we have just introduced can be rewritten in a simpler way as follows:

$$f_X = \Omega_X\, p \quad \text{and} \quad f_Y = \Omega_Y\, p', \qquad (1)$$

with

- $f_X = {}^t(f_{X_1}, \dots, f_{X_n})$, $f_Y = {}^t(f_{Y_1}, \dots, f_{Y_n})$,

- $p = {}^t(p_1, p_2)$, $p' = {}^t(p'_1, p'_2)$,

- $\Omega_X = (\omega_l(i))_{i,l}$, $\Omega_Y = (\omega'_l(i))_{i,l}$.

Definition 1 The matrices $\Omega_X$ and $\Omega_Y$ involved in the model (1) are called the mixing-weights operators.

Definition 2 Any mixture-model (1) such that $\Omega_X$ and $\Omega_Y$ are full rank matrices is called a mixture-model with varying mixing-weights.

Table 1: Population weights (and sizes) with respect to age

                            Bus/trolleybus    Walk
New York over 21 y.o.       51.93% (4313)     48.07% (3993)
New York under 20 y.o.      34.65% (306)      65.35% (577)
California over 21 y.o.     57.4% (4479)      42.6% (3324)
California under 20 y.o.    42.77% (497)      57.23% (665)
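The full-rank requirement of Definition 2 can be checked numerically. The following is only a minimal sketch, assuming the mixing-weights operator is stored as a list of rows (w1, w2); the helper names `gram` and `is_full_rank` and the tolerance are illustrative choices, not part of the paper. An n x 2 matrix has rank 2 exactly when the determinant of its 2 x 2 Gram matrix is nonzero.

```python
def gram(omega):
    """Entries (g11, g12, g22) of the 2 x 2 Gram matrix t(Omega) Omega.

    omega is an n x 2 mixing-weights operator given as a list of (w1, w2) rows.
    """
    g11 = sum(w1 * w1 for w1, _ in omega)
    g12 = sum(w1 * w2 for w1, w2 in omega)
    g22 = sum(w2 * w2 for _, w2 in omega)
    return g11, g12, g22


def is_full_rank(omega, rel_tol=1e-12):
    """True when the two columns of omega are linearly independent."""
    g11, g12, g22 = gram(omega)
    det = g11 * g22 - g12 * g12  # non-negative by Cauchy-Schwarz, zero iff rank < 2
    return det > rel_tol * g11 * g22


# Hypothetical operators: weights that vary with i, and weights that do not.
varying = [(0.9, 0.1)] * 3 + [(0.2, 0.8)] * 3
constant = [(0.3, 0.7)] * 6
print(is_full_rank(varying), is_full_rank(constant))
```

Constant mixing-weights, i.e. a mixture-model in the usual sense, give a rank-one operator, which is why the varying-weights assumption matters for the inversion step used later.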

2.2 Example of modeling with mixture-models 

Let us illustrate this theoretical set-up with the example cited in the introduction. The random variables $X_1, \dots, X_n$ correspond to the travel times of people in the state of New York and the random variables $Y_1, \dots, Y_n$ to the travel times in the state of California. The labels are the means of transportation to work and can be either Bus/trolley bus (label 1) or Walk (label 2). The last step to complete the mixture-model is to describe the mixing-weights for each observation. In each state the mixing-weights strongly depend on age (over 21 or under 20 years old (y.o.)). Table 1 illustrates this fact.
This table leads to the following mixing-weights:

$\omega_1(i) = 0.5193$, $\omega_2(i) = 0.4807$ if person $i$ is over 21 y.o. and lives in New York,

$\omega_1(i) = 0.3465$, $\omega_2(i) = 0.6535$ if person $i$ is under 20 y.o. and lives in New York,

$\omega'_1(i) = 0.574$, $\omega'_2(i) = 0.426$ if person $i$ is over 21 y.o. and lives in California,

$\omega'_1(i) = 0.4277$, $\omega'_2(i) = 0.5723$ if person $i$ is under 20 y.o. and lives in California.
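As an illustration, the mixing-weights operator of a sample can be assembled from the age group of each person. This is a sketch only: the dictionary keys ("NY", "over21", ...) and the function name `weights_operator` are hypothetical encodings chosen here, while the numerical values are those of Table 1.

```python
# Mixing-weights (label 1 = Bus/trolley bus, label 2 = Walk) read off Table 1.
# The keys are a hypothetical encoding of state and age group.
WEIGHTS = {
    ("NY", "over21"): (0.5193, 0.4807),
    ("NY", "under20"): (0.3465, 0.6535),
    ("CA", "over21"): (0.5740, 0.4260),
    ("CA", "under20"): (0.4277, 0.5723),
}


def weights_operator(state, age_groups):
    """Return the n x 2 mixing-weights operator (list of rows) for one state."""
    return [WEIGHTS[(state, group)] for group in age_groups]


# Three New York respondents: row i holds (omega_1(i), omega_2(i)).
omega_x = weights_operator("NY", ["over21", "under20", "over21"])
print(omega_x)
```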
The reader can legitimately wonder why age is assumed to be known and not the means of transportation to work. There are at least two good reasons. The first one is an a priori reason. The survey would be rather lengthy if all interesting variables were included. Therefore the survey is restricted to a small set of informative variables strongly linked with the interesting variables. Moreover, these informative variables can be chosen to be as objective as possible and easy to record. This can be called planned missing values (see Graham [3]). The other reason is an a posteriori one. During the data analysis of a survey, researchers are often confronted with new hypotheses to test. In many situations, the relevant variables have not been recorded and researchers have to plan a new survey that includes these new variables in order to check these hypotheses. This leads to a waste of time and money.
Our testing problem on means from data with undefined labels can be associated with the testing problem in the mixture-model (1). Indeed, it corresponds to a testing problem on means for which the labels of the data are unavailable: the only information on the label of $X_i$ (resp. $Y_i$) is the probability $\omega_l(i)$ (resp. $\omega'_l(i)$) that it equals $l$, for any $l \in \{1, 2\}$. In other words, the additional information on subpopulations we get is the knowledge of the mixing-weights operators.

2.3 Hypothesis testing problem 

We recall that two data samples $X = {}^t(X_1, \dots, X_n)$ and $Y = {}^t(Y_1, \dots, Y_n)$ are considered. For a chosen label $l \in \{1, 2\}$, we are interested in testing whether the components $p_l$ and $p'_l$ have the same expected value or not. We want to address this problem in a general context, that is: the variances $\sigma_k^2$ and $\sigma_k'^2$ of the components $p_k$ and $p'_k$ are unknown for every $k \in \{1, 2\}$.

For a fixed $l \in \{1, 2\}$, denoting respectively by $m_l$ and $m'_l$ the expected values of the components we focus on, the testing problem we consider rests on the two following hypotheses:

the null hypothesis $\mathcal{H}_0 : m_l = m'_l$,

(2)

the alternative hypothesis $\mathcal{H}_1 : m_l \neq m'_l$.

We recall that providing a procedure to solve the testing problem (2) means giving a decision rule (or test) $\Delta \in \{0, 1\}$ that relies on the value of a measurable function $T$ (test statistic) of $X_1, \dots, X_n$ and $Y_1, \dots, Y_n$.

As usual, $\Delta = 1$ will mean deciding $\mathcal{H}_1$ whereas $\Delta = 0$ will mean deciding $\mathcal{H}_0$.



3 Description of testing procedures 

In this section we introduce the testing procedures we are interested in. 
3.1 Oracle test: $\Delta_o$

The first testing procedure we present is called the Oracle Procedure. It is a two-step procedure. The first step consists in recovering the true labels of the data. The second step consists in applying Welch's t-test to the data with label $l$ in order to decide whether $m_l$ and $m'_l$ can be judged as different. This test cannot be used in our context, where the true labels are unknown, but it will be used as a benchmark when comparing the performance of the other testing procedures. It corresponds to the procedure proposed by the oracle: a statistician who has access to the labels.



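Here we describe the Oracle Procedure in detail. Let us denote by

- $n_l = \sum_{i=1}^{n} \mathbf{1}\{u_i = l\}$ and $n'_l = \sum_{i=1}^{n} \mathbf{1}\{v_i = l\}$,

- $\bar{X}^{(l)} = \frac{1}{n_l} \sum_{i=1}^{n} X_i \mathbf{1}\{u_i = l\}$ and $\bar{Y}^{(l)} = \frac{1}{n'_l} \sum_{i=1}^{n} Y_i \mathbf{1}\{v_i = l\}$,

- $\hat\sigma_l^2 = \frac{1}{n_l} \sum_{i=1}^{n} (X_i - \bar{X}^{(l)})^2 \mathbf{1}\{u_i = l\}$ and $\hat\sigma_l'^2 = \frac{1}{n'_l} \sum_{i=1}^{n} (Y_i - \bar{Y}^{(l)})^2 \mathbf{1}\{v_i = l\}$.

The Oracle test $\Delta_o$ relies on the test statistic $T_o$ defined as follows:

$$T_o = \frac{\left|\bar{X}^{(l)} - \bar{Y}^{(l)}\right|}{\sqrt{\dfrac{\hat\sigma_l^2}{n_l} + \dfrac{\hat\sigma_l'^2}{n'_l}}}.$$

Under the null hypothesis, the asymptotic law of $T_o$ is known to be the standard Gaussian one, namely $\mathcal{N}(0, 1)$. Hence, $\Delta_o = \mathbf{1}\{T_o > q_r\}$ is a test with asymptotic type I error equal to $r$ ($0 < r < 1$), where $q_r$ is the quantile of order $1 - \frac{r}{2}$ of the standard Gaussian law.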
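The Oracle Procedure can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the function names and the toy data are hypothetical, labels are coded 1 and 2, and the empirical variances use `statistics.pvariance` (1/n normalisation); swapping in `statistics.variance` gives the usual unbiased version of Welch's statistic, which is asymptotically equivalent.

```python
from math import sqrt
from statistics import NormalDist, mean, pvariance


def welch_statistic(x, y):
    """|mean(x) - mean(y)| / sqrt(var(x)/len(x) + var(y)/len(y))."""
    return abs(mean(x) - mean(y)) / sqrt(pvariance(x) / len(x) + pvariance(y) / len(y))


def oracle_test(x, u, y, v, l, r=0.05):
    """Keep the observations whose true label equals l, then compare T_o to q_r."""
    xl = [xi for xi, ui in zip(x, u) if ui == l]
    yl = [yi for yi, vi in zip(y, v) if vi == l]
    t_o = welch_statistic(xl, yl)
    q_r = NormalDist().inv_cdf(1 - r / 2)  # Gaussian quantile of order 1 - r/2
    return t_o, int(t_o > q_r)


# Toy data with known labels (hypothetical values).
x = [0.1, 5.0, -0.2, 4.8, 0.3, 0.0]
u = [1, 2, 1, 2, 1, 1]
y = [3.1, 5.2, 2.9, 3.3, 4.9, 3.0]
v = [1, 2, 1, 1, 2, 1]
t_o, decision = oracle_test(x, u, y, v, l=1)
print(t_o, decision)
```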



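3.2 Expert test: $\Delta_e$

The testing procedure we describe now relies on a method used in classification. It is a two-step procedure. The first step consists in allocating label $l$ to any observation $X_i$ such that $\omega_l(i) \ge \frac{1}{2}$ and to any observation $Y_j$ such that $\omega'_l(j) \ge \frac{1}{2}$. The second step consists in applying Welch's t-test to the two subsamples of data that have been assigned label $l$, in order to decide whether $m_l$ and $m'_l$ can be judged as different. Notice that Welch's t-test is then performed on data that may carry wrong labels.

Put

- $n_{l,e} = \sum_{i=1}^{n} \mathbf{1}\{\omega_l(i) \ge \frac{1}{2}\}$ and $n'_{l,e} = \sum_{i=1}^{n} \mathbf{1}\{\omega'_l(i) \ge \frac{1}{2}\}$,

- $\bar{X}_e^{(l)} = \frac{1}{n_{l,e}} \sum_{i=1}^{n} X_i \mathbf{1}\{\omega_l(i) \ge \frac{1}{2}\}$ and $\bar{Y}_e^{(l)} = \frac{1}{n'_{l,e}} \sum_{i=1}^{n} Y_i \mathbf{1}\{\omega'_l(i) \ge \frac{1}{2}\}$,

- $\hat\sigma_{l,e}^2 = \frac{1}{n_{l,e}} \sum_{i=1}^{n} (X_i - \bar{X}_e^{(l)})^2 \mathbf{1}\{\omega_l(i) \ge \frac{1}{2}\}$,

- $\hat\sigma_{l,e}'^2 = \frac{1}{n'_{l,e}} \sum_{i=1}^{n} (Y_i - \bar{Y}_e^{(l)})^2 \mathbf{1}\{\omega'_l(i) \ge \frac{1}{2}\}$.

The Expert test $\Delta_e$ relies on the test statistic $T_e$ defined as follows:

$$T_e = \frac{\left|\bar{X}_e^{(l)} - \bar{Y}_e^{(l)}\right|}{\sqrt{\dfrac{\hat\sigma_{l,e}^2}{n_{l,e}} + \dfrac{\hat\sigma_{l,e}'^2}{n'_{l,e}}}}.$$

Then, the decision rule is $\Delta_e = \mathbf{1}\{T_e > q_r\}$.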
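The two steps of the Expert Procedure can be sketched as follows. This is an illustrative sketch only: the function name `expert_test` and the toy data are hypothetical, the weights are given as rows (w1, w2), and the variances use the 1/n normalisation (`statistics.pvariance`).

```python
from math import sqrt
from statistics import NormalDist, mean, pvariance


def expert_test(x, omega_x, y, omega_y, l, r=0.05):
    """Allocate label l when its mixing-weight is >= 1/2, then compare T_e to q_r.

    omega_x and omega_y are lists of (w1, w2) rows; l is 1 or 2.
    """
    xe = [xi for xi, w in zip(x, omega_x) if w[l - 1] >= 0.5]
    ye = [yi for yi, w in zip(y, omega_y) if w[l - 1] >= 0.5]
    t_e = abs(mean(xe) - mean(ye)) / sqrt(pvariance(xe) / len(xe) + pvariance(ye) / len(ye))
    q_r = NormalDist().inv_cdf(1 - r / 2)  # Gaussian quantile of order 1 - r/2
    return t_e, int(t_e > q_r)


# Toy data: the last observation of each sample is allocated to label 2.
omega = [(0.9, 0.1), (0.9, 0.1), (0.9, 0.1), (0.1, 0.9)]
x = [1.0, 2.0, 3.0, 10.0]
y = [11.0, 12.0, 13.0, -7.0]
t_e, decision = expert_test(x, omega, y, omega, l=1)
print(t_e, decision)
```

Note that the allocation ignores how far the weights are from one half, which is why the subsamples may contain many wrongly labelled observations.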



3.3 Mixing test: $\Delta_m$

The last testing procedure we propose is inspired by ideas provided in Autin and Pouet [1]. Using combinatory methods, it inverts in some sense the mixing-weights operators so as to provide a new test that will be proved to perform well. Let us describe this new testing procedure in detail.

Let us denote by $A_X$ and $A_Y$ the matrices with $n$ lines and 2 columns satisfying

$${}^t\Omega_X A_X = {}^t\Omega_Y A_Y = \begin{pmatrix} n & 0 \\ 0 & n \end{pmatrix}. \qquad (3)$$

Notations: For any $(i, l) \in \{1, \dots, n\} \times \{1, 2\}$, we denote respectively by $a_l(i)$ (resp. $a'_l(i)$) the entries of $A_X$ (resp. $A_Y$) associated with line $i$ and column $l$.
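Following Maiboroda [5] or Pokhyl'ko [6], solutions of equations (3) are given by

$$a_l(i) = \frac{n}{\det({}^t\Omega_X \Omega_X)} \sum_{k=1}^{2} (-1)^{l+k} \gamma_{lk}\, \omega_k(i) \quad \text{and} \quad a'_l(i) = \frac{n}{\det({}^t\Omega_Y \Omega_Y)} \sum_{k=1}^{2} (-1)^{l+k} \gamma'_{lk}\, \omega'_k(i),$$

where $\gamma_{lk}$ and $\gamma'_{lk}$ are respectively the minor $(l, k)$ of the matrix ${}^t\Omega_X \Omega_X$ and of the matrix ${}^t\Omega_Y \Omega_Y$.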
For any $l \in \{1, 2\}$, $(m_l, m'_l)$ can be estimated by the method of moments, using the estimators $(\hat{m}_l, \hat{m}'_l)$ defined as follows:

$$(\hat{m}_l, \hat{m}'_l) = \left( \langle A_X^{(l)}, X \rangle_n, \langle A_Y^{(l)}, Y \rangle_n \right) := \left( \frac{1}{n} \sum_{i=1}^{n} a_l(i) X_i, \; \frac{1}{n} \sum_{i=1}^{n} a'_l(i) Y_i \right),$$

where $A_X^{(l)}$ and $A_Y^{(l)}$ respectively denote the $l$-th column-vector of the matrices $A_X$ and $A_Y$.
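The inversion step and the moment estimators can be illustrated numerically. The sketch below assumes the particular solution $A = n\,\Omega\,({}^t\Omega\,\Omega)^{-1}$ of equation (3), written out for the 2 x 2 case; the function names and the noiseless toy data are hypothetical. With observations set exactly to their expectations $\omega_1(i) m_1 + \omega_2(i) m_2$, the estimators return $(m_1, m_2)$ up to rounding.

```python
def inversion_weights(omega):
    """Rows (a_1(i), a_2(i)) of A = n * Omega * (t(Omega) Omega)^{-1}.

    omega is an n x 2 full-rank matrix given as a list of (w1, w2) rows.
    """
    n = len(omega)
    g11 = sum(w1 * w1 for w1, _ in omega)
    g12 = sum(w1 * w2 for w1, w2 in omega)
    g22 = sum(w2 * w2 for _, w2 in omega)
    det = g11 * g22 - g12 * g12
    if det <= 0:
        raise ValueError("omega must have full rank")
    # The inverse of the symmetric Gram matrix is [[g22, -g12], [-g12, g11]] / det.
    return [(n * (w1 * g22 - w2 * g12) / det, n * (w2 * g11 - w1 * g12) / det)
            for w1, w2 in omega]


def component_means(x, omega):
    """Moment estimators (m_hat_1, m_hat_2) = (1/n) * sum_i a_l(i) * x_i."""
    a = inversion_weights(omega)
    n = len(x)
    return (sum(a1 * xi for (a1, _), xi in zip(a, x)) / n,
            sum(a2 * xi for (_, a2), xi in zip(a, x)) / n)


# Noiseless check with hypothetical means m_1 = 3 and m_2 = -1.
omega = [(0.9, 0.1)] * 2 + [(0.2, 0.8)] * 2
x = [w1 * 3.0 + w2 * (-1.0) for w1, w2 in omega]
print(component_means(x, omega))
```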

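The Mixing test $\Delta_m$ relies on the test statistic $T_m$ defined by:

$$T_m := \frac{\left|\hat{m}_l - \hat{m}'_l\right|}{\sqrt{\hat{V}_n}}, \qquad (4)$$

where $\hat{V}_n$ is the estimated variance of $\hat{m}_l - \hat{m}'_l$, that is

$$\hat{V}_n = \frac{1}{n^2} \sum_{i=1}^{n} \left[ a_l^2(i) \left( X_i - \omega_1(i)\hat{m}_1 - \omega_2(i)\hat{m}_2 \right)^2 + a_l'^2(i) \left( Y_i - \omega'_1(i)\hat{m}'_1 - \omega'_2(i)\hat{m}'_2 \right)^2 \right].$$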
i=i 

Remark 1 As discussed in Autin and Pouet [1], for any $l \in \{1, 2\}$ the random variable $\hat{m}_l$ (resp. $\hat{m}'_l$) is a good estimator of $m_l$ (resp. $m'_l$). Hence, if the distance between $\hat{m}_l$ and $\hat{m}'_l$ is judged too large, rejecting the null hypothesis $\mathcal{H}_0$ is the natural decision. This idea motivates the choice of the test statistic $T_m$ defined above.

Under the null hypothesis $\mathcal{H}_0$, the asymptotic law of $T_m$ is known, according to the following theorem.

Theorem 1 Let $l \in \{1, 2\}$. Assume that

• the components within the mixture-model (1) have moments of order 4,

• the mixing-weights of the mixture-model (1) are such that

$$\lim_{n \to \infty} \frac{\sup_{1 \le i \le n} a_l^2(i)}{\sum_{i=1}^{n} a_l^2(i)} = \lim_{n \to \infty} \frac{\sup_{1 \le i \le n} a_l'^2(i)}{\sum_{i=1}^{n} a_l'^2(i)} = 0. \qquad (5)$$

Then, under the null hypothesis $\mathcal{H}_0$, the law of $T_m$ is asymptotically the standard Gaussian one, i.e.

$$T_m \xrightarrow{\;\mathcal{L}\;} \mathcal{N}(0, 1). \qquad (6)$$

Hence, $\Delta_m = \mathbf{1}\{T_m > q_r\}$ is a test with asymptotic type I error equal to $r$ ($0 < r < 1$).
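Putting the pieces together, the Mixing test can be sketched as below. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes both samples have the same size $n$, uses the particular solution $A = n\,\Omega\,({}^t\Omega\,\Omega)^{-1}$ of equation (3), takes the statistic as $|\hat{m}_l - \hat{m}'_l| / \sqrt{\hat{V}_n}$, and all function names and simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def _weights_and_means(z, omega):
    """Inversion weights A = n * Omega * (t(Omega) Omega)^{-1} and moment estimators."""
    n = len(z)
    g11 = sum(w1 * w1 for w1, _ in omega)
    g12 = sum(w1 * w2 for w1, w2 in omega)
    g22 = sum(w2 * w2 for _, w2 in omega)
    det = g11 * g22 - g12 * g12
    a = [(n * (w1 * g22 - w2 * g12) / det, n * (w2 * g11 - w1 * g12) / det)
         for w1, w2 in omega]
    m = [sum(ai[k] * zi for ai, zi in zip(a, z)) / n for k in (0, 1)]
    return a, m


def mixing_test(x, omega_x, y, omega_y, l, r=0.05):
    """Return (T_m, decision) for label l in {1, 2}; both samples have size n."""
    n, k = len(x), l - 1
    a, m = _weights_and_means(x, omega_x)
    b, mp = _weights_and_means(y, omega_y)
    # Estimated variance of m_hat_l - m_hat'_l.
    v_hat = sum(ai[k] ** 2 * (xi - w[0] * m[0] - w[1] * m[1]) ** 2
                for ai, xi, w in zip(a, x, omega_x))
    v_hat += sum(bi[k] ** 2 * (yi - w[0] * mp[0] - w[1] * mp[1]) ** 2
                 for bi, yi, w in zip(b, y, omega_y))
    v_hat /= n ** 2
    t_m = abs(m[k] - mp[k]) / sqrt(v_hat)
    q_r = NormalDist().inv_cdf(1 - r / 2)
    return t_m, int(t_m > q_r)


# Simulated Gaussian mixture (hypothetical parameters), then a shifted copy.
rng = random.Random(0)
n = 400
omega = [(0.9, 0.1)] * (n // 2) + [(0.1, 0.9)] * (n // 2)
x = [rng.gauss(0.0 if rng.random() < w1 else 1.0, 1.0) for w1, _ in omega]
t_same, d_same = mixing_test(x, omega, x, omega, l=1)
t_shift, d_shift = mixing_test(x, omega, [xi + 5.0 for xi in x], omega, l=1)
print(d_same, d_shift)
```

Comparing a sample with itself gives a statistic equal to zero, while a large shift of every observation is detected, which is a basic sanity check rather than a power study.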

Remark 2 A wide range of mixing-weights of the mixture-model satisfy the condition (5). Examples of such mixing-weights are given in (12) of Section 4.

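Proof:

To prove Theorem 1, notice that it suffices to prove that, for any $l \in \{1, 2\}$,

$$\frac{\frac{1}{n} \sum_{i=1}^{n} a_l(i) X_i - m_l}{\sqrt{\frac{1}{n^2} \sum_{j=1}^{n} a_l^2(j) \left( X_j - \omega_1(j)\hat{m}_1 - \omega_2(j)\hat{m}_2 \right)^2}} \xrightarrow{\;\mathcal{L}\;} \mathcal{N}(0, 1)$$

and

$$\frac{\frac{1}{n} \sum_{i=1}^{n} a'_l(i) Y_i - m'_l}{\sqrt{\frac{1}{n^2} \sum_{j=1}^{n} a_l'^2(j) \left( Y_j - \omega'_1(j)\hat{m}'_1 - \omega'_2(j)\hat{m}'_2 \right)^2}} \xrightarrow{\;\mathcal{L}\;} \mathcal{N}(0, 1),$$

because of the independence between the two samples and the fact that, under the null hypothesis $\mathcal{H}_0$, $m_l = m'_l$. Since these two convergence results can be proved in an analogous way, we only focus on proving the first one, which can be rewritten as follows for any $l \in \{1, 2\}$:

$$\frac{\sum_{i=1}^{n} a_l(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)}{\sqrt{\sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i)\hat{m}_1 - \omega_2(i)\hat{m}_2 \right)^2}} \xrightarrow{\;\mathcal{L}\;} \mathcal{N}(0, 1).$$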



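Denote, for any $n \in \mathbb{N}^*$ and any $l \in \{1, 2\}$,

$$B_n^{(l)} = \sum_{i=1}^{n} a_l^2(i)\, E\left[ \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)^2 \right], \qquad \hat{B}_n^{(l)} = \sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i)\hat{m}_1 - \omega_2(i)\hat{m}_2 \right)^2. \qquad (7)$$

From Proposition 1,

$$\frac{\sum_{i=1}^{n} a_l(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)}{\sqrt{B_n^{(l)}}} \xrightarrow{\;\mathcal{L}\;} \mathcal{N}(0, 1). \qquad (8)$$

In the sequel, we aim at proving that the same kind of result holds when replacing the parameter $B_n^{(l)}$ by the estimator $\hat{B}_n^{(l)}$. In other words, the result we want to prove is the following:

$$\frac{\sum_{i=1}^{n} a_l(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)}{\sqrt{\hat{B}_n^{(l)}}} \xrightarrow{\;\mathcal{L}\;} \mathcal{N}(0, 1). \qquad (9)$$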



From Slutsky's theorem, it suffices to prove that the estimator $\hat{B}_n^{(l)}$ of $B_n^{(l)}$ is consistent, in the sense that the ratio $\hat{B}_n^{(l)} / B_n^{(l)}$ converges to 1 in probability. We divide the proof of this consistency into two steps. First we prove

$$\frac{\sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)^2}{B_n^{(l)}} \xrightarrow{\;\text{Proba}\;} 1. \qquad (10)$$

The second step consists in replacing $m_1$ and $m_2$ by their consistent estimators $\hat{m}_1$ and $\hat{m}_2$ and in checking that the convergence in probability still holds.

From this point on we need more assumptions, namely the existence of the fourth-order moment for $p_1$ and $p_2$.

Let us prove the first step. We apply the Bienaymé-Chebyshev inequality: for any $\varepsilon > 0$,

$$P\left( \left| \sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)^2 - B_n^{(l)} \right| > B_n^{(l)} \varepsilon \right) \le \frac{\sum_{i=1}^{n} a_l^4(i)\, \mathrm{Var}\left( \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)^2 \right)}{\left( B_n^{(l)} \varepsilon \right)^2}$$

$$\le \frac{\sum_{i=1}^{n} a_l^4(i)\, E\left[ \left( X_i - E(X_i) \right)^4 \right]}{\left( B_n^{(l)} \varepsilon \right)^2}$$

$$\le \frac{\sup_{j=1,\dots,n} a_l^2(j) \sum_{i=1}^{n} a_l^2(i)\, C(m_1, m_2, p_1, p_2)}{\left( B_n^{(l)} \varepsilon \right)^2}$$

$$\le \frac{\sup_{j=1,\dots,n} a_l^2(j)}{B_n^{(l)}} \times \varepsilon^{-2} \left( \min(\sigma_1^2, \sigma_2^2) \right)^{-1} C(m_1, m_2, p_1, p_2).$$

The last inequalities are obtained by using Lemma 4 and Lemma 2. The right-hand side of the last inequality is the product of two terms. The left one tends to 0 when $n$ goes to infinity because of assumption (5). The right one is a constant that only depends on $\varepsilon$ and the parameters of $p_1$ and $p_2$. Letting $n$ tend to infinity, we conclude that property (10) holds.

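We end by proving the second step. We have

$$\sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i)\hat{m}_1 - \omega_2(i)\hat{m}_2 \right)^2 = \sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)^2$$

$$+\; 2 \sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right) \left( \omega_1(i)(m_1 - \hat{m}_1) + \omega_2(i)(m_2 - \hat{m}_2) \right)$$

$$+\; \sum_{i=1}^{n} a_l^2(i) \left( \omega_1(i)(m_1 - \hat{m}_1) + \omega_2(i)(m_2 - \hat{m}_2) \right)^2.$$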

The first term is exactly the one appearing in the first step and converges to 1 in probability when divided by $B_n^{(l)}$. We turn to the second term. The Cauchy-Schwarz inequality entails that

$$\left| \sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right) \left( \omega_1(i)(m_1 - \hat{m}_1) + \omega_2(i)(m_2 - \hat{m}_2) \right) \right|$$

$$\le \sqrt{\sum_{i=1}^{n} a_l^2(i) \left( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \right)^2} \times \sqrt{2 (m_1 - \hat{m}_1)^2 \sum_{i=1}^{n} a_l^2(i)\, \omega_1^2(i) + 2 (m_2 - \hat{m}_2)^2 \sum_{i=1}^{n} a_l^2(i)\, \omega_2^2(i)}.$$

When divided by $B_n^{(l)}$, the sum under the first square root of the right-hand side converges to 1 in probability: this is the result of the first step. By using Lemma 2 one gets

$$\max\left( \sum_{i=1}^{n} a_l^2(i)\, \omega_1^2(i), \; \sum_{i=1}^{n} a_l^2(i)\, \omega_2^2(i) \right) \le B_n^{(l)} \left( \min(\sigma_1^2, \sigma_2^2) \right)^{-1}.$$

Hence the sum under the second square root converges to 0 in probability when divided by $B_n^{(l)}$, because of the consistency of the estimators $\hat{m}_l$ (see the corresponding lemma in the appendix). Hence the second term we are interested in converges to 0 in probability when divided by $B_n^{(l)}$. We can proceed in the same way to prove that the third term converges to 0 in probability when divided by $B_n^{(l)}$.

So, we have just proved that

$$\frac{\hat{B}_n^{(l)}}{B_n^{(l)}} \xrightarrow{\;\text{Proba}\;} 1. \qquad (11)$$

We conclude that the exact variance $B_n^{(l)}$ can be replaced by the consistent estimator $\hat{B}_n^{(l)}$ in the convergence result. In other words, the property (9) holds.



4 Numerical experiments 



4.1 Numerical performances of the Mixing-test 

In this section we provide numerical experiments and discuss the performance of our testing procedure. What we expect is a gain in performance of the test $\Delta_m$, that is to say a smaller type II error when the type I error is chosen to be $r = 0.05$, compared with the test $\Delta_e$. Without loss of generality, we suppose that $n$ is even.

We consider the Gaussian setting and we assume in this section that the mixing-weights operators $\Omega_X$ and $\Omega_Y$ have the following form:

$$\Omega_X = \begin{pmatrix} \alpha & 1 - \alpha \\ \vdots & \vdots \\ \alpha & 1 - \alpha \\ 1 - \beta & \beta \\ \vdots & \vdots \\ 1 - \beta & \beta \end{pmatrix} \quad \text{and} \quad \Omega_Y = \begin{pmatrix} \alpha' & 1 - \alpha' \\ \vdots & \vdots \\ \alpha' & 1 - \alpha' \\ 1 - \beta' & \beta' \\ \vdots & \vdots \\ 1 - \beta' & \beta' \end{pmatrix}, \qquad (12)$$

where $\frac{n}{2}$ observations from $X$ (resp. $Y$) have the couple of mixing-weights $(\alpha, 1 - \alpha)$ (resp. $(\alpha', 1 - \alpha')$) and the other $\frac{n}{2}$ observations from $X$ (resp. $Y$) have the couple of mixing-weights $(1 - \beta, \beta)$ (resp. $(1 - \beta', \beta')$). Suppose now that our testing problem deals with the first component, i.e. $l = 1$, and that $\Omega_X$ and $\Omega_Y$ are full rank matrices, i.e. $\alpha + \beta \neq 1$ and $\alpha' + \beta' \neq 1$.
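For the two-block design (12), the ratio appearing in condition (5) can be computed explicitly, which gives a quick numerical check that it decreases like 1/n. The sketch below is illustrative only: the function names are hypothetical and the inversion weights are taken as $A = n\,\Omega\,({}^t\Omega\,\Omega)^{-1}$.

```python
def design(n, alpha, beta):
    """n x 2 operator of (12): n/2 rows (alpha, 1 - alpha), then n/2 rows (1 - beta, beta)."""
    half = n // 2
    return [(alpha, 1 - alpha)] * half + [(1 - beta, beta)] * half


def sup_ratio(omega, l=1):
    """sup_i a_l(i)^2 / sum_i a_l(i)^2 for A = n * Omega * (t(Omega) Omega)^{-1}."""
    n = len(omega)
    g11 = sum(w1 * w1 for w1, _ in omega)
    g12 = sum(w1 * w2 for w1, w2 in omega)
    g22 = sum(w2 * w2 for _, w2 in omega)
    det = g11 * g22 - g12 * g12
    if det <= 0:
        raise ValueError("omega must have full rank (alpha + beta != 1)")
    a = [(n * (w1 * g22 - w2 * g12) / det, n * (w2 * g11 - w1 * g12) / det)
         for w1, w2 in omega]
    squares = [row[l - 1] ** 2 for row in a]
    return max(squares) / sum(squares)


# The ratio is proportional to 1/n for a fixed couple (alpha, beta).
r_100 = sup_ratio(design(100, 0.9, 0.9))
r_1000 = sup_ratio(design(1000, 0.9, 0.9))
print(r_100, r_1000)
```

Because the entries $a_l(i)$ take only two values in this design, the numerator stays bounded while the denominator grows linearly with $n$, so condition (5) holds whenever the operator has full rank.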



4.1.1 Mixing-test versus Expert-test 

In this paragraph we provide a motivation for the use of our testing procedure $\Delta_m$. For the sake of simplicity we suppose that $\alpha = \beta$ and that $\alpha' = \beta'$. For any value of $(\alpha, \alpha') \in \left]\frac{1}{2}, 1\right[^2$, there are many situations where the performance of the Expert test is quite bad even if the number of observations $n$ is large.

• Dealing with two components with equal expected values, $\Delta_e$ can most of the time detect a difference between these components (wrong decision) whereas our test does not. For instance, suppose that $m_1 = m'_1$ and that $m'_2$ is far away from $m_2$, with $\alpha = \alpha'$. Since $E(\bar{X}_e^{(1)}) \neq E(\bar{Y}_e^{(1)})$, using $\Delta_e$ to test the equality of $m_1$ and $m'_1$ would be a very bad choice in that context. For $n$ large enough, it would imply that $T_e > q_r$ with high probability. Hence, the wrong decision $\mathcal{H}_1$ would often be taken.



An example of such a situation is given here in the case where $\alpha = \alpha' = 0.9$. Consider the testing problem (2) and suppose that $\sigma_1 = \sigma'_1 = \sigma_2 = \sigma'_2 = 1$ and that $m_1 = m'_1 = 0$, $m_2 = 1$ and $m'_2 = m_2 + \delta$. For varying values of $n$ and $\delta$, and 40 000 repetitions of $\Delta_e$ with $r = 0.05$, we give the percentage of wrong decisions $\mathcal{H}_1$ in Table 2.

Table 2: Percentage of wrong decisions by $\Delta_e$

$\delta$ \ $n$    100      200      500      1000     2000
0.5               0.057    0.064    0.086    0.121    0.191
1                 0.074    0.098    0.172    0.302    0.521
2                 0.126    0.210    0.462    0.749    0.963
3                 0.188    0.350    0.722    0.950    0.999

Table 3: Percentage of correct decisions by $\Delta_m$

$n$           500      1000     2000     3000     4000     5000     6000
$\Delta_m$    0.146    0.242    0.438    0.595    0.718    0.810    0.879

Notice that the percentage of wrong decisions by $\Delta_e$ increases as $n$ grows and can be quite large if $m'_2$ is sufficiently far away from $m_2$. Most of the time, the expert detects a difference between the expected values $m_1$ and $m'_1$ although there is none in that context. By comparison, the percentage of wrong decisions by $\Delta_m$ is around 0.05.

• Most of the time $\Delta_e$ fails to detect a difference between two components with different expected values whereas our test does not. For instance, suppose that $m_1 \neq m'_1$ and that

$$m_2 \approx (1 - \alpha)^{-1} \left( \alpha' m'_1 + (1 - \alpha') m'_2 - \alpha m_1 \right).$$

Since

$$E(X_i) \approx E(Y_i), \quad \text{for any } 1 \le i \le \frac{n}{2},$$

using $\Delta_e$ to detect the difference between $m_1$ and $m'_1$ would be a very bad choice in that context. Indeed, according to the law of large numbers, with high probability (which increases as $n$ goes up) $\bar{X}_e^{(1)}$ and $\bar{Y}_e^{(1)}$ would be very close to each other. It would imply that $T_e < 1.96$ with high probability. The correct decision $\mathcal{H}_1$ would be taken only in 5% of cases.

An example of such a situation is given here in the case $\alpha = \alpha' = 0.9$. Consider the testing problem (2) and suppose that $\sigma_1 = \sigma'_1 = \sigma_2 = \sigma'_2 = 1$ and that $m_1 = 0$, $m'_1 = 0.1$, $m_2 = 1$ and $m'_2 = 2$. For varying values of $n$, and 40 000 repetitions of $\Delta_m$ with $r = 0.05$, we give the percentage of correct decisions from $\Delta_m$ in Table 3.

As expected, the percentage of correct decisions by $\Delta_m$ goes up as $n$ grows. But this is not the case for the percentage of correct decisions by $\Delta_e$, which is always around 0.05. Most of the time, the expert is unable to detect the difference between the components in that context.

Finally we conclude that it is better to choose $\Delta_m$ for the problem we are interested in.

Table 4: Empirical power of $\Delta_o$ and $\Delta_m$

Test / $n$    500      1000     2000     3000     4000     5000     6000
$\Delta_o$    0.200    0.349    0.609    0.783    0.886    0.942    0.973
$\Delta_m$    0.149    0.245    0.427    0.585    0.704    0.798    0.868

Table 5: Empirical power $\mathcal{P}_m$ of $\Delta_m$ for varying $\alpha = \alpha'$

$\delta$ \ $\alpha$    0.6      0.65     0.7      0.75     0.8      0.85     0.9      0.95     1
0.1                    0.071    0.092    0.125    0.161    0.196    0.239    0.275    0.317    0.353
0.2                    0.130    0.226    0.353    0.484    0.603    0.700    0.780    0.837    0.883
0.3                    0.238    0.446    0.659    0.816    0.912    0.960    0.983    0.999    1
0.4                    0.374    0.671    0.882    0.938    0.993    1        1        1        1
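The first failure mode of the Expert test (equal means $m_1 = m'_1$ but $m'_2$ far from $m_2$) can be reproduced with a small Monte Carlo experiment. The sketch below is only indicative: it uses far fewer repetitions than the 40 000 of the study, a 1/n variance normalisation, and hypothetical function and parameter names; the parameters $m_1 = m'_1 = 0$, $m_2 = 1$, $m'_2 = 1 + \delta$, unit variances and $\alpha = \alpha' = 0.9$ follow the first example above.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pvariance


def expert_wrong_rate(n=1000, delta=3.0, alpha=0.9, reps=100, r=0.05, seed=1):
    """Proportion of repetitions in which the Expert test rejects although m_1 = m'_1."""
    rng = random.Random(seed)
    q_r = NormalDist().inv_cdf(1 - r / 2)
    half = n // 2
    rejections = 0
    for _ in range(reps):
        # Only the n/2 observations with weight alpha >= 1/2 on label 1 are kept by the
        # expert; each is drawn from component 1 with probability alpha, else component 2.
        xe = [rng.gauss(0.0 if rng.random() < alpha else 1.0, 1.0) for _ in range(half)]
        ye = [rng.gauss(0.0 if rng.random() < alpha else 1.0 + delta, 1.0)
              for _ in range(half)]
        t_e = abs(mean(xe) - mean(ye)) / sqrt(pvariance(xe) / half + pvariance(ye) / half)
        rejections += t_e > q_r
    return rejections / reps


rate_far = expert_wrong_rate(delta=3.0)   # second components far apart
rate_same = expert_wrong_rate(delta=0.0)  # both mixtures identical
print(rate_far, rate_same)
```

With the second components far apart, the contamination of the expert's subsamples shifts their means, so the rejection rate is driven by $\delta$ rather than by any difference between $m_1$ and $m'_1$.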

4.1.2 Mixing-test versus Oracle-test 

In this paragraph we compare the empirical power of $\Delta_m$ with that of the Oracle test $\Delta_o$, taking $r = 0.05$ and the same parameters as in the last example. We recall that the empirical power of any test $\Delta$ corresponds to the numerical evaluation of the probability of correctly deciding $\mathcal{H}_1$ according to $\Delta$.

According to Table 4, we remark that the larger $n$, the better the powers of $\Delta_o$ and of $\Delta_m$. Moreover, we note that the empirical power of the Mixing test is not bad when compared with the Oracle test.

In Table 5 we give the empirical power $\mathcal{P}_m$ of $\Delta_m$ measured on samples of size $n = 1000$ in the case where $m_1 = 0$, $m'_1 = \delta$, $m_2 = 1$ and $m'_2 = 0$.

As expected, looking at Table 5,

• the quantity $\mathcal{P}_m$ depends on the intrinsic difficulty of the problem. Indeed, the larger the absolute value of the quantity $\delta := m'_1 - m_1$, the easier the detection problem and so the more powerful the test,

• the larger the degree of certainty $\alpha$, the better the power of $\Delta_m$. This is due to the fact that the expected number of wrong labels considered by the Expert Procedure grows as $\alpha$ goes down.
4.1.3 Comparisons on performances of $\Delta_m$ for varying values of $(\alpha, \alpha')$

As previously discussed, we expect that the better the degree of certainty of the expert, the better the performance of the test $\Delta_m$. This statement is highlighted here when considering the same parameters as before, $\delta = 0.5$ and several choices of the couple $(\alpha, \alpha')$. For each choice of $(\alpha, \alpha')$, we provide in Table 6 the empirical power $\mathcal{P}_m$ of our test $\Delta_m$, that is, the percentage of correct detections of a difference between $m_1$ and $m'_1$.

Table 6: Percentage of correct decisions by $\Delta_m$

$(\alpha, \alpha')$ \ $n$    100      200      500
(0.90, 0.60)                 0.153    0.254    0.533
(0.80, 0.70)                 0.311    0.536    0.896
(0.75, 0.75)                 0.332    0.564    0.918

The interpretation of the results presented in Table 6 goes in the same way as in Autin and Pouet [1]: the bigger the smallest eigenvalue of both operators ${}^t\Omega_X \Omega_X$ and ${}^t\Omega_Y \Omega_Y$, that is $\lambda_{\min} = \frac{1}{2} \left( 1 - 2 \min(\alpha, \alpha') (1 - \min(\alpha, \alpha')) \right)$, the better the power of our test $\Delta_m$. Note that the larger the minimum value between $\alpha \in \left]\frac{1}{2}, 1\right[$ and $\alpha' \in \left]\frac{1}{2}, 1\right[$, the bigger $\lambda_{\min}$.

4.1.4 Brief conclusion 

Let us summarize the main facts. First, in some cases experts can be completely wrong because of the overall design, that is to say the link between the means of the components and the mixing-weights. This is a serious issue for the Expert test: the results become worse and worse as the sample size increases. The test adapted to the varying mixing-weights that we propose does not suffer from this drawback. The second fact is the good behavior of our test compared with the Oracle test. Although it lags behind, its power is quite acceptable. The last important fact, which has already been stressed by Autin and Pouet [1], is the effect of the mixing-weights. It is known a priori thanks to the smallest eigenvalue of the operators ${}^t\Omega_X \Omega_X$ and ${}^t\Omega_Y \Omega_Y$. This point is important as the statistician can act to counter this effect, e.g. by improving the accuracy of the expert system giving the mixing-weights or by increasing the sample sizes.

4.2 Application to real data 

In this section we apply our methodology to real data and discuss the results.
Table 7: Description of the population

                   NY               CA
Total              9189             8935
Over 21            90.39% (8306)    87.04% (7803)
Under 20           9.61% (883)      12.96% (1162)
Walk               49.73% (4570)    44.5% (3989)
Bus/trolley bus    50.27% (4619)    55.5% (4979)

Table 8: One-way analysis of travel time (in minutes): mean (standard deviation)

      Walk             Bus/trolley bus    Walk and Bus/trolley bus
NY    12.25 (12.18)    47.26 (28.79)      29.85 (28.23)
CA    11.23 (12.23)    45.12 (28.84)      30.04 (28.49)

4.2.1 Description of the data 

We have selected data from the U.S. Census Bureau website, more precisely PUMS 2006 (see [5]). We are interested in comparing the travel time of people living either in the state of New York (abbreviated NY) or in the state of California (abbreviated CA). Two means of transportation have been kept: Bus/trolley bus and Walk. We have also kept a variable linked to age, as it will be useful for the mixture-model with varying mixing-weights. This variable records whether a person is over 21 years old or under 20 years old.

Here are a few facts to roughly describe the PUMS sample. Table 7 gives one-level information.

In Table 8 we compute the mean and the standard deviation (in parentheses) of the travel time according to the categorical variable means of transportation to work.

As can be seen in Table 8, there might be no difference between New York and California. Nevertheless, if the means of transportation is unavailable, it would be perilous to decide by considering the whole sample without any other information. Indeed, as the preceding tables show, the difference between New York and California is reduced because of the structure of the population (fewer people under 20 years old in New York).

4.2.2 Methodology 

We assume in the sequel that the information about the means of transportation (labels) is unavailable at the microdata level. We are going to apply the test $\Delta_m$ adapted to the varying mixing-weights mixture-model. The age variable is the only auxiliary information available at the microdata level that allows us to obtain the mixing-weights of our mixture-model (1).
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Table 9: Bus/trolleybus decisions 





Decision n = 1000 


p- value 


Oracle test 


not rejected 


0.24 


Expert test 


not rejected 


0.42 


Mixing test 


not rejected 


0.11 


Table 10: Walk decisions

               Decision (n = 1000)    p-value
Oracle test    not rejected           0.23
Expert test    rejected               0.04
Mixing test    not rejected           0.48



For comparison purposes we have also applied the so-called Expert test. The
type I error level is set to 0.1.
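
The decision rule behind Tables 9 and 10 can be sketched as follows: the null hypothesis of equal means is rejected when the p-value falls below the level 0.1. This is only an illustration; the helper name `decide` is ours and does not come from the paper, while the p-values are the ones reported in Tables 9 and 10.

```python
# Illustrative sketch: turn a p-value into a decision at level 0.1.
# `decide` is a hypothetical helper name, not taken from the paper.
ALPHA = 0.1  # type I error level chosen in the text


def decide(p_value, alpha=ALPHA):
    """Return the decision for a test of equality of means."""
    return "rejected" if p_value < alpha else "not rejected"


# p-values reported in Table 9 (Bus/trolley bus) and Table 10 (Walk)
table_9 = {"Oracle test": 0.24, "Expert test": 0.42, "Mixing test": 0.11}
table_10 = {"Oracle test": 0.23, "Expert test": 0.04, "Mixing test": 0.48}

decisions_9 = {name: decide(p) for name, p in table_9.items()}
decisions_10 = {name: decide(p) for name, p in table_10.items()}
print(decisions_9)
print(decisions_10)
```

With these inputs only the Expert test on the Walk component leads to a rejection, in agreement with the decisions listed in the two tables.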

According to the notation introduced in (12) and to Table 7,

\[ (\alpha, \beta) = (0.5193, 0.6535), \qquad (\alpha', \beta') = (0.574, 0.5723). \tag{13} \]

We consider the following sample: 500 persons over 21 and 500 persons under 
20 were randomly sampled in each state (n = 1000). 

We applied three testing procedures: 

1. Oracle test, 

2. Expert test, 

3. Mixing test. 
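
As a rough illustration of the first procedure, and assuming (as the abstract indicates) that the Oracle test is Welch's t-test applied to observations whose true labels are known, the statistic can be computed from summary statistics as sketched below. The function name, the input figures and the normal approximation used for the p-value are our own illustrative choices, not the paper's; the group sizes and means in particular are hypothetical and are not taken from the PUMS sample.

```python
import math
from statistics import NormalDist


def welch_from_summary(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Welch statistic, Welch-Satterthwaite degrees of freedom and an
    approximate two-sided p-value (normal approximation, large samples)."""
    se2_x = sd_x ** 2 / n_x  # squared standard error of the first mean
    se2_y = sd_y ** 2 / n_y  # squared standard error of the second mean
    t_stat = (mean_x - mean_y) / math.sqrt(se2_x + se2_y)
    dof = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (n_x - 1) + se2_y ** 2 / (n_y - 1)
    )
    # Assumption: samples are large enough to replace Student's t by N(0, 1).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, dof, p_value


# Hypothetical labelled sub-samples (made-up numbers, for illustration only).
t_stat, dof, p_value = welch_from_summary(46.0, 29.0, 320, 44.0, 28.0, 300)
print(round(t_stat, 3), round(dof, 1), round(p_value, 3))
```

For small samples the normal approximation should be replaced by the Student distribution with `dof` degrees of freedom.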

First we test the equality of the averages when the means of transportation to
work is Bus/trolley bus (label 1) in Table 9. In this case, the other means of
transportation to work is considered as a nuisance parameter.
Next we reverse the set-up. We test the equality of the averages when the means
of transportation to work is Walk (label 2) in Table 10. Bus/trolley bus is now
a nuisance parameter.

4.2.3 A tough situation 

Here we are also interested in comparing the travel time of people living either in the
state of New York or in the state of Illinois (abbreviated IL). Data come
from the U.S. Census Bureau [5]. Two means of transportation to work have been
kept: Bus/trolley bus and Railroad. We have also kept the gender variable, as it
will be useful for the varying mixing-weights mixture-model. As will be seen,
the situation is much more involved than the one in the previous section.

Table 11: Description of the population

                   NY               IL
Total              6974             2899
Men                46.5% (3247)     48.2% (1398)
Women              53.5% (3727)     51.8% (1501)
Bus/trolley bus    66.2% (4619)     58.4% (1692)
Railroad           33.8% (2355)     41.6% (1207)


Table 12: Mixing-weights

            Bus/trolley bus    Railroad
NY men      55.8% (1813)       44.2% (1434)
NY women    75.3% (2806)       24.7% (921)
IL men      50.8% (710)        49.2% (688)
IL women    65.4% (982)        34.6% (519)



Here are a few facts that roughly describe the PUMS sample. Table 11 gives
one-level information.

The mixing-weights depend on gender, as illustrated in Table 12.
According to the notation introduced in (12) and to Table 12,

\[ (\alpha, \beta) = (0.558, 0.247), \qquad (\alpha', \beta') = (0.508, 0.346). \tag{14} \]
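
The values in (14) can be recovered from the counts of Table 12. The short check below is a sketch under one assumption of ours about the notation: the first entry of each pair is read as the Bus/trolley bus share among men and the second as the Railroad share among women, which is the reading that matches the reported figures. The dictionary layout and the helper `share` are illustrative names.

```python
# Counts of commuters by state, gender and means of transportation (Table 12).
counts = {
    ("NY", "men"): {"bus": 1813, "rail": 1434},
    ("NY", "women"): {"bus": 2806, "rail": 921},
    ("IL", "men"): {"bus": 710, "rail": 688},
    ("IL", "women"): {"bus": 982, "rail": 519},
}


def share(group, mode):
    """Proportion of `mode` users inside a (state, gender) group."""
    cell = counts[group]
    return cell[mode] / (cell["bus"] + cell["rail"])


# Assumed reading of (14): (bus share among men, railroad share among women).
ny_pair = (round(share(("NY", "men"), "bus"), 3),
           round(share(("NY", "women"), "rail"), 3))
il_pair = (round(share(("IL", "men"), "bus"), 3),
           round(share(("IL", "women"), "rail"), 3))
print(ny_pair, il_pair)
```

Under this reading the two printed pairs coincide with the pairs given in (14).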

In Table 13 we compute the mean and standard deviation (in parentheses) of
the travel time according to the categorical variable means of transportation to
work.

Once again the difference in travel time is reduced if we consider the entire
population. This is due to its structure. As a larger share of both men and women
use the railroad in Illinois, the overall average travel time is increased there. The
reverse holds in New York.
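
The effect of the population structure can be checked with a few lines of arithmetic: the overall mean in each state is the average of the per-mode means weighted by the mode shares. The sketch below uses the shares of Table 11 and the means of Table 13; the variable and function names are ours.

```python
# Shares of each means of transportation (Table 11) and group means (Table 13).
shares = {"NY": {"bus": 0.662, "rail": 0.338},
          "IL": {"bus": 0.584, "rail": 0.416}}
means = {"NY": {"bus": 47.3, "rail": 71.0},
         "IL": {"bus": 41.8, "rail": 63.1}}


def pooled_mean(state):
    """Overall mean travel time as a share-weighted average of group means."""
    return sum(shares[state][mode] * means[state][mode]
               for mode in ("bus", "rail"))


gap_bus = means["NY"]["bus"] - means["IL"]["bus"]
gap_rail = means["NY"]["rail"] - means["IL"]["rail"]
gap_pooled = pooled_mean("NY") - pooled_mean("IL")
print(round(pooled_mean("NY"), 1), round(pooled_mean("IL"), 1))
print(round(gap_bus, 2), round(gap_rail, 2), round(gap_pooled, 2))
```

The pooled means agree with the last column of Table 13 up to rounding, and the pooled gap between the two states is smaller than the gap within either means of transportation.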

In Table 14 we test the equality of the averages when the means of transportation
to work is Bus/trolley bus.

In Table 15 we reverse the set-up and we test the equality of the averages when
the means of transportation to work is Walk.



Table 13: One-way analysis of travel time

            Bus/trolley bus    Railroad       Bus/trolley bus and Railroad
New York    47.3 (28.8)        71 (30)        55.3 (31.3)
Illinois    41.8 (26.4)        63.1 (25.7)    50.7 (28.2)

Table 14: Bus/trolleybus decision

               Decision (n = 1000)    p-value
Oracle test    not rejected           0.19
Expert test    not rejected           0.13
Mixing test    not rejected           0.75


Table 15: Walk decision

               Decision (n = 1000)    p-value
Oracle test    rejected               0.05
Expert test    not available          not available
Mixing test    not rejected           0.12



5 Conclusion 

From our point of view, one of the most interesting points is the usefulness of the
varying mixing-weights model. It is a versatile model that can be used in many
situations where microdata are missing but aggregated information is available.
The application treated above exemplifies the modeling.

The second take-away message is the excellent performance of the Mixing test
we propose. This performance can be anticipated a priori thanks to the smallest
eigenvalue of the operators involved in the mixture-model. It was shown both
theoretically and numerically.

To conclude, let us point out that this work can easily be extended to
mixture-models with more than two components, and that it can be carried out in a
nonparametric setting by using the testing procedure proposed by Butucea and
Tribouley [2] as the Oracle test and the one given by Autin and Pouet [1] as the
Mixing test.

An interesting extension that should really be considered is the case of
mixing-weights with errors. This arises when mixing-weights are computed from a
model with estimated parameters or from experts' evaluation. In this case the
solution of (3) is no longer exact, as the matrices $\Omega_X$ and $\Omega_Y$ are random.
Preliminary simulation results suggest that moderate errors have a small
effect.

6 Appendix 

In this section we provide the technical lemmas and the proposition required to
prove the asymptotic normality of interest. For the sake of simplicity, we
present them with respect to $X_1, \ldots, X_n$; analogous versions hold for
$Y_1, \ldots, Y_n$. We recall that we assume that, for any $l \in \{1,2\}$, the
mixing-weights of the model satisfy (5).

Denote, for any $n \in \mathbb{N}^*$, any $l \in \{1,2\}$ and any $1 \le i \le n$,

\[ W_{n,i}^{(l)} = \frac{a_l(i)}{\sqrt{B_n^{(l)}}} \bigl( X_i - \omega_1(i) m_1 - \omega_2(i) m_2 \bigr). \tag{15} \]



Lemma 1 For any $1 \le i \le n$,

\[ \mathrm{Var}(X_i) = \sum_{l=1}^{2} \omega_l(i) \int (x - m_l)^2 p_l(x)\,dx + \omega_1(i)\omega_2(i)(m_1 - m_2)^2. \]

Proof:

For any $1 \le i \le n$,

\begin{align*}
\mathrm{Var}(X_i) &= \int \bigl(x - \omega_1(i) m_1 - \omega_2(i) m_2\bigr)^2 \bigl(\omega_1(i) p_1(x) + \omega_2(i) p_2(x)\bigr)\,dx \\
&= \sum_{l=1}^{2} \omega_l(i) \int (x - m_l)^2 p_l(x)\,dx + \omega_1(i)\omega_2(i)(m_1 - m_2)^2. \qquad \blacksquare
\end{align*}
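
The variance decomposition of a two-component mixture can be checked numerically. The sketch below compares the direct computation $E[X^2] - (E[X])^2$ with the sum of the weighted within-component variances and the between-component term $\omega_1\omega_2(m_1 - m_2)^2$. Function names are ours, and the numerical inputs (a mixing-weight from (14), means and standard deviations from Table 13) are used only as an example.

```python
def mixture_variance_direct(w1, m1, s1, m2, s2):
    """Var(X) = E[X^2] - E[X]^2 for the density w1*p1 + (1 - w1)*p2,
    where p_k has mean m_k and standard deviation s_k."""
    w2 = 1.0 - w1
    mean = w1 * m1 + w2 * m2
    second_moment = w1 * (s1 ** 2 + m1 ** 2) + w2 * (s2 ** 2 + m2 ** 2)
    return second_moment - mean ** 2


def mixture_variance_decomposed(w1, m1, s1, m2, s2):
    """Weighted within-component variances plus the between-component term."""
    w2 = 1.0 - w1
    return w1 * s1 ** 2 + w2 * s2 ** 2 + w1 * w2 * (m1 - m2) ** 2


# Example inputs: weight 0.558, component means 47.3 and 71, sds 28.8 and 30.
args = (0.558, 47.3, 28.8, 71.0, 30.0)
direct = mixture_variance_direct(*args)
decomposed = mixture_variance_decomposed(*args)
print(direct, decomposed)
```

The two values agree up to floating-point error for any admissible inputs, since only the first two moments of each component are involved.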



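From Lemma 1 we immediately derive:

Lemma 2 For any $l \in \{1,2\}$, let $B_n^{(l)}$ be defined as in (7). Then

\[ B_n^{(l)} \ge \min(\sigma_1^2, \sigma_2^2) \sum_{i=1}^{n} a_l^2(i). \]

Proof:

For any $l \in \{1,2\}$, by using Lemma 1 and the fact that, for any $1 \le i \le n$,
$\omega_1(i) + \omega_2(i) = 1$,

\begin{align*}
B_n^{(l)} &= \sum_{i=1}^{n} a_l^2(i) \left( \sum_{u=1}^{2} \omega_u(i) \int (x - m_u)^2 p_u(x)\,dx + \omega_1(i)\omega_2(i)(m_1 - m_2)^2 \right) \\
&\ge \sum_{i=1}^{n} a_l^2(i) \min\left( \int (x - m_1)^2 p_1(x)\,dx, \int (x - m_2)^2 p_2(x)\,dx \right) \\
&= \min(\sigma_1^2, \sigma_2^2) \sum_{i=1}^{n} a_l^2(i).
\end{align*}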



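Lemma 3 For any $l \in \{1,2\}$, let $B_n^{(l)}$ and $W_{n,i}^{(l)}$ ($1 \le i \le n$) be defined as in (7)
and (15). Then, for any $\varepsilon > 0$,

\[ \lim_{n \to \infty} \sum_{i=1}^{n} \mathbb{E}\left[ \bigl(W_{n,i}^{(l)}\bigr)^2 \mathbf{1}_{\{|W_{n,i}^{(l)}| > \varepsilon\}} \right] = 0. \]

Proof:

Fix $l \in \{1,2\}$. Let us define, for any $n \in \mathbb{N}^*$,

\[ k_n = \min\{m_1, m_2\} - \frac{\varepsilon \sqrt{B_n^{(l)}}}{\sup_i \{|a_l(i)|\}}, \qquad K_n = \max\{m_1, m_2\} + \frac{\varepsilon \sqrt{B_n^{(l)}}}{\sup_i \{|a_l(i)|\}}. \]

Then, all integrals below being taken over the set where $x > K_n$ or $x < k_n$,

\begin{align*}
\sum_{i=1}^{n} \mathbb{E}\left[ \bigl(W_{n,i}^{(l)}\bigr)^2 \mathbf{1}_{\{|W_{n,i}^{(l)}| > \varepsilon\}} \right]
&\le \sum_{i=1}^{n} \frac{a_l^2(i)}{B_n^{(l)}} \int \bigl(x - \omega_1(i) m_1 - \omega_2(i) m_2\bigr)^2 \,dF_{X_i}(x) \\
&\le 2 \sum_{i=1}^{n} \frac{a_l^2(i)}{B_n^{(l)}} \Biggl[ \sum_{u \in \{1,2\}} \omega_u(i)^3 \int (x - m_u)^2 p_u(x)\,dx \\
&\qquad + \omega_2^2(i)\omega_1(i) \int (x - m_2)^2 p_1(x)\,dx + \omega_1^2(i)\omega_2(i) \int (x - m_1)^2 p_2(x)\,dx \Biggr] \\
&\le 2 \left( \sum_{i=1}^{n} \frac{a_l^2(i)}{B_n^{(l)}} \right) \sup_{u \in \{1,2\}} \left( \sum_{k \in \{1,2\}} \int (x - m_k)^2 p_u(x)\,dx \right).
\end{align*}

Using Lemma 2, for any $n \in \mathbb{N}^*$,

\[ \sum_{i=1}^{n} \frac{a_l^2(i)}{B_n^{(l)}} \le \frac{1}{\min(\sigma_1^2, \sigma_2^2)}. \]

Then, since the variances under $p_1$ and $p_2$ are finite, the supremum above tends to
0 as $n$ goes to infinity, according to the Lebesgue dominated convergence theorem.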

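Lemma 4 For any $1 \le i \le n$,

\[ \mathbb{E}\left[ (X_i - \mathbb{E}(X_i))^4 \right] \le C(m_1, m_2, p_1, p_2), \]

where $C(m_1, m_2, p_1, p_2) := 32 \max_{(k,l) \in \{1,2\}^2} \int (x - m_k)^4 p_l(x)\,dx$.

Proof:

We have

\begin{align*}
\mathbb{E}\left[ (X_i - \mathbb{E}(X_i))^4 \right]
&= \mathbb{E}\left[ \bigl(X_i - \omega_1(i) m_1 - \omega_2(i) m_2\bigr)^4 \right] \\
&\le 8\,\omega_1(i)^4\, \mathbb{E}\bigl((X_i - m_1)^4\bigr) + 8\,\omega_2(i)^4\, \mathbb{E}\bigl((X_i - m_2)^4\bigr) \\
&= 8 \Bigl[ \omega_1(i)^5 \int (x - m_1)^4 p_1(x)\,dx + \omega_1(i)^4 \omega_2(i) \int (x - m_1)^4 p_2(x)\,dx \\
&\qquad + \omega_2(i)^4 \omega_1(i) \int (x - m_2)^4 p_1(x)\,dx + \omega_2(i)^5 \int (x - m_2)^4 p_2(x)\,dx \Bigr] \\
&\le 32 \max_{(k,l) \in \{1,2\}^2} \int (x - m_k)^4 p_l(x)\,dx.
\end{align*}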



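Lemma 5 For any $1 \le l \le 2$, the estimator $\hat{m}_l = \frac{1}{n} \sum_{i=1}^{n} a_l(i) X_i$ of $m_l$ is
consistent, that is

\[ \hat{m}_l \xrightarrow{\;\text{Proba}\;} m_l. \]

Proof:

Let $\varepsilon > 0$ and $l \in \{1,2\}$. We have, using the Bienaymé-Chebyshev inequality and
Lemma 1,

\begin{align*}
\mathbb{P}(|\hat{m}_l - m_l| > \varepsilon)
&\le \frac{1}{n^2 \varepsilon^2} \sum_{i=1}^{n} a_l^2(i)\, \mathrm{Var}(X_i) \\
&= \frac{1}{n^2 \varepsilon^2} \sum_{i=1}^{n} a_l^2(i) \left( \sum_{u \in \{1,2\}} \omega_u(i) \sigma_u^2 + \omega_1(i)\omega_2(i)(m_1 - m_2)^2 \right) \\
&\le \frac{1}{n \varepsilon^2 K} \left( \max_{u \in \{1,2\}} \sigma_u^2 + \frac{1}{4}(m_1 - m_2)^2 \right).
\end{align*}

The last inequality is obtained by using the assumption that the smallest eigenvalue of
$\Omega_X$ is larger than $Kn$ (with $K > 0$) and the fact that the supremum
over $x \in [0, 1]$ of $x \mapsto x(1 - x)$ is equal to $\frac{1}{4}$. The right-hand side clearly tends to
0 when $n$ goes to infinity. We conclude that $\hat{m}_l$ is consistent.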
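
To make the estimator of Lemma 5 concrete, the sketch below builds weights $a_l(i)$ and checks that the estimator returns $m_l$ exactly when every $X_i$ is replaced by its expectation $\omega_1(i) m_1 + \omega_2(i) m_2$. The form of the weights is an assumption on our part, since (3) is not restated in this section: we take $a_l(i)$ to be the linear combination of $\omega_1(i)$ and $\omega_2(i)$ such that $\frac{1}{n}\sum_i a_l(i)\omega_k(i)$ equals 1 if $k = l$ and 0 otherwise. Function names and the toy design (two groups of 500 observations with mixing-weights borrowed from Table 12) are illustrative.

```python
def weights_a(omega):
    """omega: list of (w1, w2) mixing-weights, one pair per observation.
    Returns the list of (a_1(i), a_2(i)) such that
    (1/n) * sum_i a_l(i) * w_k(i) = 1 if k == l else 0 (assumed form)."""
    n = len(omega)
    # Gram matrix G[k][l] = (1/n) * sum_i w_k(i) * w_l(i)
    g11 = sum(w1 * w1 for w1, _ in omega) / n
    g12 = sum(w1 * w2 for w1, w2 in omega) / n
    g22 = sum(w2 * w2 for _, w2 in omega) / n
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("mixing-weights do not vary: means not identifiable")
    # a_l(i) = sum_k inv(G)[l][k] * w_k(i), written out for the 2x2 case
    return [((g22 * w1 - g12 * w2) / det, (-g12 * w1 + g11 * w2) / det)
            for w1, w2 in omega]


def estimate_means(x, omega):
    """Return (m1_hat, m2_hat) with m_l_hat = (1/n) * sum_i a_l(i) * x_i."""
    n = len(x)
    a = weights_a(omega)
    m1_hat = sum(a_i[0] * x_i for a_i, x_i in zip(a, x)) / n
    m2_hat = sum(a_i[1] * x_i for a_i, x_i in zip(a, x)) / n
    return m1_hat, m2_hat


# Toy design: 500 observations with weights (0.558, 0.442),
# 500 observations with weights (0.753, 0.247).
omega = [(0.558, 0.442)] * 500 + [(0.753, 0.247)] * 500
m1, m2 = 47.3, 71.0
x_noise_free = [w1 * m1 + w2 * m2 for w1, w2 in omega]  # E[X_i]
m1_hat, m2_hat = estimate_means(x_noise_free, omega)
print(m1_hat, m2_hat)
```

With real data the $X_i$ fluctuate around their expectations, and the closer the two groups of mixing-weights are to each other, the smaller the determinant and the larger the variance of the estimator.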



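Proposition 1 For any $l \in \{1,2\}$, any $n \in \mathbb{N}^*$ and any $1 \le i \le n$, consider
$W_{n,i}^{(l)}$ defined as in (15). Then

\[ \sum_{i=1}^{n} W_{n,i}^{(l)} \xrightarrow{\;d\;} \mathcal{N}(0,1). \]

Proof:

We apply Theorem 4.2 in Petrov [8]. It is the general form of the Central
Limit Theorem for a triangular array $(W_{n,i}^{(l)})_{1 \le i \le n}$ of independent random
variables that are not identically distributed.

If the following three conditions are satisfied for any $\varepsilon > 0$ and any $\tau > 0$:

1. $\lim_{n \to \infty} \sum_{i=1}^{n} \mathbb{P}\bigl(|W_{n,i}^{(l)}| > \varepsilon\bigr) = 0$,

2. $\lim_{n \to \infty} \sum_{i=1}^{n} \int_{|y| < \tau} y \,dF_{W_{n,i}^{(l)}}(y) = 0$,

3. $\lim_{n \to \infty} \sum_{i=1}^{n} \left[ \int_{|y| < \tau} y^2 \,dF_{W_{n,i}^{(l)}}(y) - \left( \int_{|y| < \tau} y \,dF_{W_{n,i}^{(l)}}(y) \right)^2 \right] = 1$,

then $\sum_{i=1}^{n} W_{n,i}^{(l)} \xrightarrow{\;d\;} \mathcal{N}(0,1)$.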



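Let us prove that the three conditions are satisfied. Let $\varepsilon > 0$. Using the
Bienaymé-Chebyshev inequality,

\[ \sum_{i=1}^{n} \mathbb{P}\bigl(|W_{n,i}^{(l)}| > \varepsilon\bigr) \le \varepsilon^{-2} \sum_{i=1}^{n} \mathbb{E}\left[ \bigl(W_{n,i}^{(l)}\bigr)^2 \mathbf{1}_{\{|W_{n,i}^{(l)}| > \varepsilon\}} \right]. \]

Hence, condition 1 is clearly satisfied by using Lemma 3.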

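Let us move to condition 2. We use the same trick as above. For any $\tau > 0$,

\[ \sum_{i=1}^{n} \int_{|y| < \tau} y \,dF_{W_{n,i}^{(l)}}(y) = \sum_{i=1}^{n} \left[ \int y \,dF_{W_{n,i}^{(l)}}(y) - \int_{|y| > \tau} y \,dF_{W_{n,i}^{(l)}}(y) \right]. \]

The first summand is equal to 0 as the variables $W_{n,i}^{(l)}$ are centered. Moreover,

\[ \left| \sum_{i=1}^{n} \int_{|y| > \tau} y \,dF_{W_{n,i}^{(l)}}(y) \right| \le \sum_{i=1}^{n} \int_{|y| > \tau} |y| \,dF_{W_{n,i}^{(l)}}(y) \le \tau^{-1} \sum_{i=1}^{n} \mathbb{E}\left[ \bigl(W_{n,i}^{(l)}\bigr)^2 \mathbf{1}_{\{|W_{n,i}^{(l)}| > \tau\}} \right]. \]

Condition 2 is clearly satisfied by using Lemma 3.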

We end the proof with condition 3. There are two parts (because of the two
summands) in this condition. For the first part we proceed exactly as in condition
2. Indeed we have

\[ \sum_{i=1}^{n} \int_{|y| < \tau} y^2 \,dF_{W_{n,i}^{(l)}}(y) = \sum_{i=1}^{n} \int y^2 \,dF_{W_{n,i}^{(l)}}(y) - \sum_{i=1}^{n} \int_{|y| > \tau} y^2 \,dF_{W_{n,i}^{(l)}}(y). \]

The first summand is exactly equal to 1 and the second one tends to 0 as $n$
goes to infinity, according to Lemma 3. Therefore it remains to prove that the
second part tends to 0 when $n$ goes to infinity. Because the variables $W_{n,i}^{(l)}$ are
centered and according to the Cauchy-Schwarz inequality:

\[ \sum_{i=1}^{n} \left( \int_{|y| < \tau} y \,dF_{W_{n,i}^{(l)}}(y) \right)^2 = \sum_{i=1}^{n} \left( \int_{|y| > \tau} y \,dF_{W_{n,i}^{(l)}}(y) \right)^2 \le \sum_{i=1}^{n} \int_{|y| > \tau} y^2 \,dF_{W_{n,i}^{(l)}}(y) = \sum_{i=1}^{n} \mathbb{E}\left[ \bigl(W_{n,i}^{(l)}\bigr)^2 \mathbf{1}_{\{|W_{n,i}^{(l)}| > \tau\}} \right]. \]

Still using Lemma 3, we conclude that the second part we are interested in tends
to 0 when $n$ goes to infinity, as expected.